Server availability reporting using asynchronous E-mail message

ABSTRACT

A method and system for reporting availability of a plurality of servers. The method comprises the steps of for each of said plurality of servers, periodically searching a log file for indication of a crash or shutdown or start, computing the time said crash or shutdown or start occurred, and sending said time and indication as an e-mail to a database on a server. The method further comprises the steps of calculating the duration of downtime for each said crash or said shutdown; for each server, sorting said e-mail by order received; and displaying for each of said plurality of servers, said duration of downtime in said order. For example, the sending step includes the step of sending said e-mail to a database on a pre-identified one server; and the sorting step includes the step of sorting said e-mail by order received by said pre-identified one server. Also, with a preferred embodiment, each of the plurality of servers includes an e-mail function; and the sending step includes the step of, each server using its e-mail function to send the time and indication to said database.

BACKGROUND OF THE INVENTION

[0001] 1. Field of the Invention

[0002] This invention generally relates to distributed data processing systems. More specifically, the invention relates to methods and systems for reporting the availability of servers in a distributed system.

[0003] 2. Prior Art

[0004] Some conventional data processing environments comprise a plurality of user terminals or work stations connected to a central host data processing system. Such data processing environments are typically referred to as central or host environments. Increasing in popularity are distributed data processing environments in which user terminals or work stations are connected to plural server data processing systems.

[0005] In a typical distributed environment employing a distributed database management system, a group of administrators collectively performs operational tasks associated with management of servers such as Groupware and E-mail servers. Both E-mail and Groupware applications usually generate megabytes of information during normal daily operation. The information is typically stored in a log format. The logs are preferably processed with a view to identifying error conditions and thus to eliminating or at least reducing application server failures. However, the processing of such logs is a laborious activity.

[0006] One particular problem associated with distributed database environments is to provide statistics relating to server availability for a large distributed system, for example one having more than 250 servers. The prior art tools use either data replication or polling to retrieve the necessary data, which either consume huge disk space and bandwidth or take too long for a large number of servers. More specifically, if data replication is used to retrieve the necessary data, then the same amount of data would need to be stored 250 times, and the same data would need to be transmitted through the network 250 times. This is a waste of substantial storage space and bandwidth.

SUMMARY OF THE INVENTION

[0007] An object of this invention is to provide an improved method and system for reporting the availability of servers in a distributed data processing system.

[0008] Another object of the present invention is to report information on the availability of distributed servers, in a way that decreases the data storage and bandwidth requirements.

[0009] These and other objectives are attained with a method and system for reporting availability of a plurality of servers. The method comprises the steps of for each of said plurality of servers, periodically searching a log file for indication of a crash or shutdown or start, computing the time said crash or shutdown or start occurred, and sending said time and indication as an e-mail to a database on a server. The method further comprises the steps of calculating the duration of downtime for each said crash or said shutdown; for each server, sorting said e-mail by order received; and displaying for each of said plurality of servers, said duration of downtime in said order.

[0010] For example, the sending step includes the step of sending said e-mail to a database on a pre-identified one server; and the sorting step includes the step of sorting said e-mail by order received by said pre-identified one server. Also, with a preferred embodiment, each of the plurality of servers includes an e-mail function; and the sending step includes the step of, each server using its e-mail function to send the time and indication to said database.
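As an illustration only, the calculating, sorting, and displaying steps summarized above might be sketched as follows. The report tuple layout, the receipt sequence number, and the rule of pairing each crash or shutdown with the next start from the same server are assumptions made for this sketch; the description does not prescribe a particular data layout or pairing rule.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical report layout: (receipt sequence number, server name,
# indication, event time). The sequence number stands in for the order
# in which the pre-identified server received the e-mails.
Report = Tuple[int, str, str, datetime]


def downtime_by_server(reports: List[Report]) -> Dict[str, List[Tuple[str, timedelta]]]:
    """Group reports per server in order received and compute downtimes."""
    per_server = defaultdict(list)
    for report in sorted(reports, key=lambda r: r[0]):  # order received
        per_server[report[1]].append(report)

    result = {}
    for server, items in per_server.items():
        outages = []
        down_kind, down_at = None, None
        for _, _, kind, when in items:
            if kind in ("crash", "shutdown") and down_at is None:
                # Assumption: an outage begins at a crash or shutdown.
                down_kind, down_at = kind, when
            elif kind == "start" and down_at is not None:
                # Assumption: the outage ends at the next start.
                outages.append((down_kind, when - down_at))
                down_kind, down_at = None, None
        result[server] = outages
    return result


if __name__ == "__main__":
    sample = [
        (2, "app12", "start", datetime(2003, 5, 1, 2, 20, 0)),
        (1, "app12", "crash", datetime(2003, 5, 1, 2, 14, 7)),
        (3, "app14", "shutdown", datetime(2003, 5, 1, 3, 0, 0)),
        (4, "app14", "start", datetime(2003, 5, 1, 3, 30, 0)),
    ]
    # Display, per server, each downtime in the order received.
    for server, outages in downtime_by_server(sample).items():
        for kind, duration in outages:
            print(server, kind, duration)
```

In this sketch, an outage that has not yet been followed by a start is simply left out of the result; a real report would need to decide how to show a server that is still down.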

[0011] Further benefits and advantages of the invention will become apparent from a consideration of the following detailed description, given with reference to the accompanying drawings, which specify and show preferred embodiments of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

[0012] FIG. 1 illustrates a distributed data processing system.

[0013] FIG. 2 shows one procedure for compiling and reporting statistics on the availability of a group of distributed servers.

[0014] FIG. 3 is a block diagram of a server that may be used as the reporter server in the system of FIG. 1.

[0015] FIG. 4 is a block diagram of a server that may be used as an application server in the system of FIG. 1.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0016] With reference to FIGS. 1 and 2, a distributed data processing system 10 embodying the present invention comprises a plurality of application server computer systems 12, 14, 16 and a Reporter or Monitor server computer system 20, all interconnected via a network. Each application server provides a service to a set of client user terminals or work stations, and preferably the Reporter server 20 is connected to an administration terminal or work station. It should be noted that FIG. 1 is a very simplified illustration of a distributed data processing system. In practice, system 10 may have many more application servers than the three shown in FIG. 1, and the system may have many additional devices or elements not expressly shown in the Figure.

[0017] In the operation of system 10, the servers collect data about themselves. These data typically include information identifying when each server was started and stopped, and data relating to why the server stopped operating. For example, these data may indicate that operation of a server was stopped by the user, as the result of normal operation of the server, or because the server, or system 10, crashed—that is, as the result of some fault or error by the server or system 10. Other data may also be collected.
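By way of a non-limiting sketch, a server might search its own log for start, shutdown, and crash indications and compute the time of each as shown below. The log line format, the marker phrases, and the names ServerEvent and scan_log are hypothetical and chosen only for illustration; an actual Domino log uses its own format and messages.

```python
import re
from datetime import datetime
from typing import Iterable, List, NamedTuple


class ServerEvent(NamedTuple):
    server: str
    kind: str        # "crash", "shutdown", or "start"
    time: datetime


# Hypothetical log line format: "YYYY-MM-DD HH:MM:SS <message>".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (.*)$")

# Hypothetical marker phrases; real log messages will differ.
EVENT_MARKERS = {
    "crash": re.compile(r"fatal error|panic", re.IGNORECASE),
    "shutdown": re.compile(r"server shutdown complete", re.IGNORECASE),
    "start": re.compile(r"server started", re.IGNORECASE),
}


def scan_log(server: str, lines: Iterable[str]) -> List[ServerEvent]:
    """Return the crash, shutdown, and start events found in the log lines."""
    events = []
    for line in lines:
        match = TIMESTAMP.match(line.rstrip("\n"))
        if not match:
            continue  # skip lines that do not carry a timestamp
        when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        for kind, pattern in EVENT_MARKERS.items():
            if pattern.search(match.group(2)):
                events.append(ServerEvent(server, kind, when))
                break
    return events


if __name__ == "__main__":
    sample = [
        "2003-05-01 02:14:07 Fatal error: out of memory",
        "2003-05-01 02:20:00 Server started",
        "2003-05-01 09:00:00 Mail routed to user",
    ]
    for event in scan_log("app12", sample):
        print(event.server, event.kind, event.time.isoformat())
```

Run periodically, such a scan would also need to remember how far into the log it has already read, so that the same event is not reported twice; that bookkeeping is omitted here.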

[0018] Reporter 20 collects data from servers 12, 14, 16 and issues reports indicating the availability of the servers. Commonly, these reports may be issued on a daily, weekly, or monthly basis. In order to do this, Reporter 20 may maintain data for an extended length of time. As discussed above, with prior art systems, the Reporter obtains the desired information by polling or by replication. In accordance with the present invention, each application server of system 10 sends, or pushes, the desired information to Reporter 20 by electronic mail.
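Purely as an illustration of the push approach, the sketch below builds an e-mail carrying the time and indication for one event and shows how the receiving side might read it back. It uses Python's standard email package rather than the Domino e-mail function of the preferred embodiment, and the addresses, subject line, and key=value body layout are assumptions made for this example.

```python
from datetime import datetime
from email import message_from_string, policy
from email.message import EmailMessage
from typing import Dict


def build_event_mail(server: str, kind: str, when: datetime,
                     reporter_address: str) -> EmailMessage:
    """Compose an e-mail reporting one event (hypothetical layout)."""
    msg = EmailMessage()
    msg["From"] = f"{server}@example.com"      # placeholder sender address
    msg["To"] = reporter_address
    msg["Subject"] = f"availability {server} {kind}"
    # Assumed body layout: one key=value pair per line.
    msg.set_content(f"server={server}\nkind={kind}\ntime={when.isoformat()}\n")
    return msg


def parse_event_mail(raw: str) -> Dict[str, str]:
    """Recover the key=value fields from a received event e-mail."""
    msg = message_from_string(raw, policy=policy.default)
    body = msg.get_content()
    return dict(line.split("=", 1) for line in body.splitlines() if "=" in line)


if __name__ == "__main__":
    mail = build_event_mail("app12", "crash",
                            datetime(2003, 5, 1, 2, 14, 7),
                            "reporter@example.com")
    print(parse_event_mail(mail.as_string()))
```

Because each server sends only its own small event messages, nothing needs to be replicated and the Reporter does not have to poll. Actual delivery is left out of the sketch, since it depends on the mail system in use.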

[0019] The use of e-mail to push data to a centralized server is much more efficient, in comparison to data replication or polling, in terms of disk space, network bandwidth and speed. The approach of this invention is also easier for administrators to maintain, and can be deployed easily across multiple domains.

[0020] System 10 preferably uses the Domino server's built-in e-mail capability to send the necessary data to a central server 20. In this way, the storage and bandwidth requirements are significantly decreased, for example by a factor of 250. This approach also solves a problem that the polling method cannot solve: when an outage indicator is received, the polling method cannot determine whether it is the server or the network that suffered the outage. The present invention has the added advantage that it can be deployed easily on multiple domains.

[0021] As will be understood by those of ordinary skill in the art, any suitable application servers and Reporter server may be used in the practice of this invention. As an illustrative example, FIG. 3 shows details of one server that may be used as Reporter 20. Server 20 comprises a system random access memory (RAM) 200, a system read only memory (ROM) 210, a central processing unit (CPU) 220, a mass storage device 230 comprising one or more large capacity magnetic disks or similar data recording media, one or more removable storage means 240 such as floppy disk drives, CD ROM drives and the like, a network adaptor 250, a keyboard adaptor 260, a pointing device adaptor 270, and a display adaptor 280, all interconnected via a bus architecture 290. A keyboard 300 is coupled to the bus architecture 290 via the keyboard adaptor 260. Similarly, a pointing device 310, such as a mouse, touch screen, tablet, tracker ball or the like, is coupled to the bus architecture 290 via the pointing device adaptor 270. Equally, a display, such as a liquid crystal display (LCD) panel or the like, is coupled to the bus architecture 290 via the display adaptor 280. Additionally, the Reporter server 20 is coupled to the servers 12, 14, 16 via the network adaptor 250.

[0022] Basic input output system (BIOS) software is stored in the ROM 210 for enabling data communications between the CPU 220, mass storage 230, RAM 200, ROM 210, removable storage 240, and adaptors 250-280 via the bus architecture 290. Stored on the mass storage device 230 are operating system software and application software. Further application software may be loaded into the server 20 via the removable storage 240 or the network adaptor 250. The operating system software enables the server 20 to select and run the application software. The application software stored in the server 20, such as Lotus Notes/Domino, is capable of receiving and storing incoming e-mails.

[0023] Also, as an illustrative example, FIG. 4 shows details of a server that may be used as servers 12, 14, 16. The application servers 12, 14, 16 contain software such as Lotus Notes/Domino that is capable of generating system logs and sending out e-mails. This application server comprises a system random access memory (RAM) 700, a system read only memory (ROM) 710, a central processing unit (CPU) 720, a mass storage device 730 comprising one or more large capacity magnetic disks or similar data recording media, one or more removable storage means 740 such as floppy disk drives, CD ROM drives and the like, a network adaptor 750, a keyboard adaptor 760, a pointing device adaptor 770, and a display adaptor 780, all interconnected via a bus architecture 790. A keyboard 800 is coupled to the bus architecture 790 via the keyboard adaptor 760. Similarly, a pointing device 810, such as a mouse, touch screen, tablet, tracker ball or the like, is coupled to the bus architecture 790 via the pointing device adaptor 770. Equally, a display output device 820, such as a cathode ray tube (CRT) display, liquid crystal display (LCD) panel, or the like, is coupled to the bus architecture 790 via the display adaptor 780. Additionally, each application server 12, 14, 16 is coupled to the Reporter server 20 and to remote client terminals or work stations via the network adaptor 750.

[0024] It will be appreciated that, preferably, the Reporter server is located, in terms of system hierarchy, between the application servers 12, 14, 16 and the administration terminal or work station. In operation, the Reporter server operates as a mid-level systems manager. The application servers 12, 14, 16 record in log files the data transfers in which they are involved, such as messages and e-mail to or from the connected client terminals or work stations. The log files maintained by the application servers 12, 14, 16 are directed to the Reporter server 20. The server 20 processes the received log files to reduce the amount of reporting information sent to the administration terminal or work station.

[0025] While it is apparent that the invention herein disclosed is well calculated to fulfill the objects previously stated, it will be appreciated that numerous modifications and embodiments may be devised by those skilled in the art, and it is intended that the appended claims cover all such modifications and embodiments as fall within the true spirit and scope of the present invention. 

1. A method of reporting availability of a plurality of servers, said method comprising the steps of: for each of said plurality of servers, periodically searching a log file for indication of a crash or shutdown or start; computing the time said crash or shutdown or start occurred; sending said time and indication as an e-mail to a database on a server; calculating the duration of downtime for each said crash or said shutdown; for each server, sorting said e-mail by order received; and displaying for each of said plurality of servers, said duration of downtime in said order.
 2. A method according to claim 1, wherein: the sending step includes the step of sending said e-mail to a database on a pre-identified one server; and the sorting step includes the step of sorting said e-mail by order received by said pre-identified one server.
 3. A method according to claim 1, wherein: each of the plurality of servers includes an e-mail function; and the sending step includes the step of, each server using its e-mail function to send the time and indication to said database.
 4. A method according to claim 1, wherein: the sending step includes the step of sending said e-mail to a database on a pre-identified one server; and the displaying step includes the step of, said one server, issuing a report showing said durations of downtimes.
 5. A method according to claim 1, wherein: the searching step includes the step of, each of said plurality of servers, searching a log file on said each server for said indication; and the computing step includes the step of each of said plurality of servers computing the time said crash or shutdown or start of said each server occurred.
 6. A system for reporting availability of a plurality of servers, said system comprising: means for periodically searching, for each of said plurality of servers, a log file for indication of a crash or shutdown or start; means for computing the time said crash or shutdown or start occurred; means for sending said time and indication as an e-mail to a database on a server; means for calculating the duration of downtime for each said crash or said shutdown; means for sorting, for each server, said e-mail by order received; and a display means for displaying for each of said plurality of servers, said duration of downtime in said order.
 7. A system according to claim 6, wherein: the sending means includes means for sending said e-mail to a database on a pre-identified one server; and the sorting means includes means for sorting said e-mail by order received by said pre-identified one server.
 8. A system according to claim 6, wherein: each of the plurality of servers includes an e-mail function; and each server using its e-mail function to send the time and indication to said database.
 9. A system according to claim 6, wherein: said e-mail is sent to a database on a pre-identified one server; and said one server includes means for issuing a report showing said durations of downtimes.
 10. A system according to claim 6, wherein: the searching means includes means, on each of said plurality of servers, for searching a log file on said each server for said indication; and the computing means includes means on each of said plurality of servers, for computing the time said crash or shutdown or start of said each server occurred.
 11. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for reporting availability of a plurality of servers, said method steps comprising: for each of said plurality of servers, periodically searching a log file for indication of a crash or shutdown or start; computing the time said crash or shutdown or start occurred; sending said time and indication as an e-mail to a database on a server; calculating the duration of downtime for each said crash or said shutdown; for each server, sorting said e-mail by order received; and displaying for each of said plurality of servers, said duration of downtime in said order.
 12. A program storage device according to claim 11, wherein: the sending step includes the step of sending said e-mail to a database on a pre-identified one server; and the sorting step includes the step of sorting said e-mail by order received by said pre-identified one server.
 13. A program storage device according to claim 11, wherein: each of the plurality of servers includes an e-mail function; and the sending step includes the step of, each server using its e-mail function to send the time and indication to said database.
 14. A program storage device according to claim 11, wherein: the sending step includes the step of sending said e-mail to a database on a pre-identified one server; and the displaying step includes the step of, said one server, issuing a report showing said durations of downtimes.
 15. A program storage device according to claim 11, wherein: the searching step includes the step of, each of said plurality of servers, searching a log file on said each server for said indication; and the computing step includes the step of each of said plurality of servers computing the time said crash or shutdown or start of said each server occurred.
 16. A method of reporting availability of a plurality of servers, said method comprising the steps of: for each of said plurality of servers, periodically searching a log file for an indication of the occurrence of each of a group of defined conditions for the server; computing the time of occurrence of the group of defined conditions; sending said time and said indication, as electronic mail, to a database on a reporter server; for each of a subset of the group of defined conditions, calculating the duration of time the server is in each of said subset of conditions; for each server, sorting the electronic mail by order received by the reporter server; and for each of the plurality of servers, displaying in said order, the duration of time the server is in said subset of conditions.
 17. A method according to claim 16, wherein the group of conditions includes a crash of the server, a shutdown of the server, and a start of the server. 